BOUNDS AND RATES OF CONVERGENCE FOR THE EXTENDED COMPOUND
ESTIMATION PROBLEM IN THE SEQUENCE CASE


I. Introduction and Summary.


A. The problem.


Let $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_n, \ldots)$ be a countably infinite vector
whose components $\theta_i$ are elements of some finite interval $\Omega$ of the
real line. Let $\mathcal{P} = \{p_\theta : \theta \in \Omega\}$ be, for some measure $\mu$, a family of
known probability density functions with parameter $\theta$. Let $X_i$ be a
real valued random variable with density $p_{\theta_i}(\cdot)$. Suppose the vector $\boldsymbol{\theta}$
is unknown and for each $i$ it is desired to estimate $\theta_i$. The estimates
are to be made in sequence and the estimate of $\theta_i$ may be based on the
independent observations $X_1, X_2, \ldots, X_i$. Thus for each
$i = 1, 2, \ldots$, a non randomized estimator $\varphi_i(\mathbf{X}_i)$ is sought for $\theta_i$,
where $\mathbf{X}_i$ is the vector of observations $(X_1, \ldots, X_i)$. It is assumed
that at each stage of this estimation problem one suffers squared error
loss, so that if $\varphi_i$ is the estimate of $\theta_i$, a loss of $(\varphi_i - \theta_i)^2$
units is suffered. The risk of an estimator $\varphi_i$ is defined to be the
expected loss, that is $E[(\varphi_i(\mathbf{X}_i) - \theta_i)^2]$. The average risk for $n$
estimations becomes $\frac{1}{n} \sum_{i=1}^{n} E[(\varphi_i(\mathbf{X}_i) - \theta_i)^2]$. One would like to find,
for specified $\Omega$ and $\mathcal{P}$, a decision procedure $\varphi = (\varphi_1, \varphi_2, \ldots)$ which,
on the basis of its average risk for the first $n$ estimations, is in
some sense optimal for large $n$.








One way in which such a problem could arise is as follows: suppose 
the Navy wishes to screen all new recruits and to classify them on the 
basis of their "natural aptitudes" to be radar technicians. In an
attempt to do this, each recruit is given a test whose outcome can be 
represented as a number. Suppose also that "natural aptitude" can be 
represented on a numerical scale. On the basis of prolonged testing and 
evaluation in the past, the Navy has been able to fit a good probability 
distribution model for the outcome of a person's test score given his 
"true" aptitude as a parameter. The Navy now wants to estimate each new 
recruit's aptitude on the basis of his test score. While squared error 
loss is somewhat artificial, it is clear that the more the Navy errs in 
estimating a recruit's aptitude the greater the loss it suffers, and 
squared error loss is a convenient way to represent this. In this 
example it is also apparent that many decisions will be made, and from 
the Navy's point of view, the average risk incurred is a reasonable basis 
upon which to judge the "optimality" of a decision procedure. In this 
example, then, $\theta_i$ would be the $i$th recruit's true aptitude and $X_i$
would be his test score.

In the preceding example it is not unreasonable to assume each
recruit's aptitude is independent of all other recruits' aptitudes. An
example will now be presented in which it is not unreasonable to suppose
the $\theta_i$'s would occur in "patterns." Suppose a Navy anti-submarine
group is on patrol duty to guard against submarine penetration. It is
necessary, in deciding what type of patrol to carry out, to have an
estimate of the average sonar detection range. This range will depend
upon many different factors such as sea temperature and salinity, as
well as the sonar equipment involved. Suppose a test is conducted every
few hours whose results follow reasonably well a known probability
distribution with the true average detection range as a parameter. In this
example then, $\theta_i$ would be the true detection range and $X_i$ the test
result. One would not expect $\theta_i$ and $\theta_{i+1}$ to be "independent," however,
as the conditions fixing their value, while changing, are changing more
or less continuously in time and a high value of $\theta_i$ would tend to mean
a high value of $\theta_{i+1}$ as well. In this example, as in the previous one,
a decision about the true detection range will be made many times, and
the average risk is a reasonable criterion to use in evaluating a decision
procedure.


B. Known results.

The problem of finding a good estimator is really twofold. First
some standard of optimality must be established, and secondly a procedure
must be found which yields good results according to this standard.
Samuel [11] has considered the following standard. Fix $n$. Let $G_n(\cdot)$
be the empirical distribution function of $\theta_1, \ldots, \theta_n$. That is

$$G_n(x) = \frac{1}{n}\,(\text{the number of } i \le n \text{ such that } \theta_i \le x).$$

Let $\{\Theta_i : i = 1, \ldots, n\}$ be mutually independent identically distributed
random variables with a priori distribution function $G_n$. If we now consider
$X_i$ to be an observation of a random variable with the conditional
density function $p_\theta$ given that $\Theta_i = \theta$, then the usual Bayes argument
gives $\psi(X_i) = E[\Theta_i \mid X_i]$ as the estimator achieving the minimum Bayes risk
$R(G_n)$. Of course this procedure does not apply to the compound estimation
problem since $G_n$ is unknown and in any case the $\theta_i$ are not
observations of random variables. Nevertheless Samuel has shown $R(G_n)$ is
an "optimal" standard to use in evaluating a procedure in the
following sense: Let $R_n(\varphi, \boldsymbol{\theta})$ denote the average risk for the first
$n$ decisions incurred by a decision procedure $\varphi$ against a parameter
vector $\boldsymbol{\theta}$. Then $R(G_n)$ is an "optimal" standard in that if one considers
only the class of "obvious" procedures $\{\varphi : \varphi_i(\mathbf{X}_i) = \varphi(X_i)\}$,
then for each $n$, $R_n(\varphi, \boldsymbol{\theta}) \ge R(G_n)$. In other words if one
bases his decision about $\theta_i$ only on the observation having $\theta_i$ as a
parameter and uses the same rule for each $i$, one can never achieve a
lower average risk than the number $R(G_n)$.
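
The comparison between a fixed "obvious" rule and the standard $R(G_n)$ can be checked numerically. The sketch below assumes, purely for illustration, that $p_\theta$ is the Poisson family and that the parameter values are the short list `thetas`; the helper names and the truncation point `x_max` are hypothetical choices, not part of the original text.

```python
import math

def poisson_pmf(x, theta):
    """P[X = x | theta] for the Poisson family (an illustrative choice of p_theta)."""
    return math.exp(-theta) * theta**x / math.factorial(x)

def bayes_rule(thetas, x):
    """E[Theta | X = x] when Theta is drawn from the empirical distribution of thetas."""
    weights = [poisson_pmf(x, t) for t in thetas]
    total = sum(weights)
    return sum(t * w for t, w in zip(thetas, weights)) / total if total > 0 else 0.0

def average_risk(rule, thetas, x_max=60):
    """Average over i of E[(rule(X_i) - theta_i)^2], with x summed over 0..x_max."""
    risk = 0.0
    for t in thetas:
        risk += sum(poisson_pmf(x, t) * (rule(x) - t) ** 2 for x in range(x_max + 1))
    return risk / len(thetas)

thetas = [1.0, 2.0, 4.0, 2.0, 1.0, 4.0]
r_gn = average_risk(lambda x: bayes_rule(thetas, x), thetas)  # the standard R(G_n)
r_naive = average_risk(lambda x: float(x), thetas)            # the simple rule phi(x) = x
print(r_gn, r_naive)
```

For the rule $\varphi(x) = x$ the average risk equals the mean of the $\theta_i$, since a Poisson variable has variance $\theta$; the Bayes rule against $G_n$ can do no worse.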

Samuel also gives several sufficient conditions on $\Omega$, $\{p_\theta : \theta \in \Omega\}$,
and $\varphi$ which ensure that for each fixed $\boldsymbol{\theta}$

$$\limsup_{n \to \infty}\, (R_n(\varphi, \boldsymbol{\theta}) - R(G_n)) \le 0$$

and in several cases she exhibits specific procedures which satisfy the
above condition.

Robbins [6] [7] [8] and Johns [5] have done work in the related
empirical Bayes problem (see Chapter III, Section G) and many of the
decision procedures they derive are also "optimal" in the compound
decision problem. Extensions of their estimators will be used in later
sections.


C. Summary of new results.

As mentioned in Section B it is first necessary to establish a
reasonable standard of "optimality" to use in evaluating a particular
decision procedure. Many reasons have been advanced in the literature
for considering the risk $E[(\varphi_i - \theta_i)^2]$ to be a good indication of how
well a particular decision rule does. In the compound decision problem,
it seems even more reasonable to consider the average risk $R_n(\varphi, \boldsymbol{\theta})$ as
a reliable index to be used in evaluating a particular decision procedure
$\varphi$, and this is the index adopted in this paper. A standard $R(\boldsymbol{\theta}_n)$ is
now needed such that if for all $\boldsymbol{\theta}$ and $n$, $R_n(\varphi, \boldsymbol{\theta})$ is no greater than
$R(\boldsymbol{\theta}_n)$, one would be willing to say $\varphi$ is a good decision procedure.
Samuel has given good intuitive reasons for selecting $R(\boldsymbol{\theta}_n) = R(G_n)$, and
has made the statement [11] that $R(G_n)$ cannot, in the limit, be improved
upon. Based on an idea of Johns [5], a sequence of more stringent
standards $\{R_k(\boldsymbol{\theta}_n) : k = 1, 2, \ldots\}$ will, however, be obtained in this paper such
that $R_1(\boldsymbol{\theta}_n) = R(G_n)$ and

$$R_k(\boldsymbol{\theta}_n) = R_{k+1}(\boldsymbol{\theta}_n) + f(k, n, \boldsymbol{\theta}) + h(k, n, \boldsymbol{\theta})$$

where $f(k, n, \boldsymbol{\theta}) \ge 0$ and $|h(k, n, \boldsymbol{\theta})| = O(1/n)$ uniformly in $\boldsymbol{\theta}$. For most
$\boldsymbol{\theta}$, $f(k, n, \boldsymbol{\theta})$ is in fact strictly positive. $R_k(\boldsymbol{\theta}_n)$ will be shown to be
the minimum Bayes risk possible if in fact $\boldsymbol{\theta}_n$ is a realization of an
$n$ dimensional random vector whose last $k$ components are independently
distributed from the first $n - k$ components according to the $k$
dimensional empirical distribution function generated by $\boldsymbol{\theta}_n$. The
extended compound estimation problem is defined to be the problem of finding
procedures which asymptotically achieve these standards. The analogous
problem in the empirical Bayes case is being considered by
Barndorff-Nielsen [1]. To make these statements more explicit several definitions
are needed. These definitions will be used throughout the paper.








Def. 1) Let $\Omega$ be a bounded interval of the real line. Let
$\mathcal{P} = \{p_\theta : \theta \in \Omega\}$ be a family of probability density functions with respect
to some measure $\mu$. Let $\{\theta_i : i = 1, 2, \ldots\}$ be an arbitrary
sequence where $\theta_i \in \Omega$, $i = 1, 2, \ldots$. Let $\{X_j : j = 1, 2, \ldots\}$ be a sequence of mutually
independent real valued random variables with $X_j$ distributed according to
$p_{\theta_j}(\cdot)$. Let $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots)$. Let $\boldsymbol{\theta}_n$ be the vector consisting of the first
$n$ components of $\boldsymbol{\theta}$. Let $\mathbf{X}_n = (X_1, \ldots, X_n)$.


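Def. 2) For all $\boldsymbol{\theta}$, $n$, and $k = 1, \ldots, n$, the $k$th order empirical
distribution function of $\boldsymbol{\theta}_n$ is:

$$G_n^k(y_1, y_2, \ldots, y_k) = \frac{1}{n-k+1}\,(\#\text{ of } j\ (k \le j \le n) \text{ such that } \theta_{j-k+\ell} \le y_\ell,\ \ell = 1, \ldots, k).$$

When $k = 1$ this definition yields the usual empirical distribution
function.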
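
A small sketch of the $k$th order empirical distribution function may help fix the indexing. It assumes the normalization $1/(n-k+1)$ over the $n-k+1$ windows of $k$ consecutive parameters; the function name and the sample sequence are illustrative only.

```python
def empirical_cdf_k(thetas, k, y):
    """G_n^k(y_1..y_k): fraction of windows (theta_{j-k+1}, ..., theta_j),
    j = k..n, whose components are all <= the matching y_l."""
    n = len(thetas)
    if not 1 <= k <= n or len(y) != k:
        raise ValueError("need 1 <= k <= n and len(y) == k")
    windows = [thetas[j - k:j] for j in range(k, n + 1)]
    hits = sum(all(t <= bound for t, bound in zip(w, y)) for w in windows)
    return hits / (n - k + 1)

pattern = [1, 5, 1, 5, 1, 5]
print(empirical_cdf_k(pattern, 1, (1,)))     # usual empirical distribution at 1
print(empirical_cdf_k(pattern, 2, (1, 5)))   # pairs (1, 5) occur in 3 of 5 windows
print(empirical_cdf_k(pattern, 2, (1, 1)))   # the pair (1, 1) never occurs
```

For a sequence with a repeating pattern the second order function is far from the product of its marginals, which is the situation the higher order standards are designed to exploit.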

Let $k$ and $m$ be fixed arbitrary positive integers, $k \le m$. Let
$\{\Theta_i : i = 1, \ldots, m\}$ be a sequence of random variables with range space
$\Omega$. Let $\Theta_{m-k+1}, \ldots, \Theta_m$ have an a priori joint distribution function
$G$ and assume the remaining $\Theta_i$ are distributed independently of $\Theta_{m-k+1}, \ldots, \Theta_m$.
Let $\{X_i : i = 1, \ldots, m\}$ be a sequence of random variables with
conditional density functions $p_{\theta_i}$ given $\Theta_i = \theta_i$, such that the $X_i$
are mutually conditionally independent given the $\Theta_i$. For estimating
the realization $\theta_m$ of $\Theta_m$, it is well known that the estimate
$\psi(\mathbf{X}_m^k) = E[\Theta_m \mid X_{m-k+1}, \ldots, X_m]$, which depends only on the last $k$
observations, is a Bayes estimate and achieves the Bayes risk $R(G)$.





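Def. 3) For all $\boldsymbol{\theta}$, $n$, $k$ such that $1 \le k \le n < \infty$,
let $R_k(\boldsymbol{\theta}_n) = R(G_n^k)$ where $G_n^k$ is the $k$th order empirical distribution
function of $\boldsymbol{\theta}_n$. Thus $R_k(\boldsymbol{\theta}_n)$ is the Bayes risk for $G_n^k$.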


Using the above definitions it will be shown in theorem 1) that

$$f(k, n, \boldsymbol{\theta}) = E\{(E[\Theta_{k+1} \mid X_1, \ldots, X_{k+1}] - E[\Theta_{k+1} \mid X_2, \ldots, X_{k+1}])^2\}$$

where $\Theta_1, \ldots, \Theta_{k+1}$ have the a priori joint distribution function $G_n^{k+1}$.
It is clear that $f$ is always non negative and will equal zero only if
$E[\Theta_{k+1} \mid X_1, \ldots, X_{k+1}] = E[\Theta_{k+1} \mid X_2, \ldots, X_{k+1}]$ with probability one. This
condition is clearly satisfied for most $\mathcal{P}$ only if $\boldsymbol{\theta}_n$ generates an
empirical distribution function $G_n^{k+1}$ such that $\Theta_1$ and $\Theta_{k+1}$ are
independently distributed. It is not unreasonable to suppose that "few"
arbitrary sequences, occurring in situations leading to the compound
decision problem, will satisfy this condition, even as $n$ approaches $\infty$.
Another necessary condition for this equality
is that the sample serial correlation coefficient of lag $k + 1$ of
$\{\theta_i : i = 1, \ldots, n\}$ be zero. Again it seems unlikely that many
sequences of $\theta_i$ would have this property, especially for small values
of $k$. In particular if $\boldsymbol{\theta}$ has repeated "patterns" of length greater
than $k$, neither of these conditions would be expected to hold.
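
The serial correlation condition is easy to examine on a concrete sequence. The sketch below assumes one common definition of the sample serial correlation coefficient (lagged cross products of deviations from the mean, divided by the total sum of squares); the text does not spell out which variant is meant, so the formula and the function name are assumptions.

```python
def serial_correlation(thetas, lag):
    """Sample serial correlation of the given lag:
    sum_i (t_i - mean)(t_{i+lag} - mean) / sum_i (t_i - mean)^2."""
    n = len(thetas)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(thetas)")
    mean = sum(thetas) / n
    denom = sum((t - mean) ** 2 for t in thetas)
    if denom == 0:
        raise ValueError("constant sequence: correlation undefined")
    num = sum((thetas[i] - mean) * (thetas[i + lag] - mean) for i in range(n - lag))
    return num / denom

pattern = [1.0, 5.0] * 10          # a repeating pattern of length 2
print(serial_correlation(pattern, 1))
print(serial_correlation(pattern, 2))
```

A patterned sequence such as this one has serial correlations far from zero at every small lag, so the conditions under which $f$ vanishes fail for it.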

Accepting $R_k(\boldsymbol{\theta}_n)$ as a standard to be used in evaluating a decision
procedure $\varphi$, attention is turned to constructing procedures for specific
classes $\mathcal{P}$, and to evaluating these procedures.








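Def. 4) Let $\varepsilon^k(\varphi, \boldsymbol{\theta}) = R_n(\varphi, \boldsymbol{\theta}) - R_k(\boldsymbol{\theta}_n)$. Thus $\varepsilon^k(\varphi, \boldsymbol{\theta})$
represents the difference after $n$ decisions between the average risk
attained by a particular decision procedure and the $k$th order standard.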

For many important classes $\mathcal{P}$, including the normal, gamma, a
discrete exponential family, and a "non-parametric" class, decision
procedures $\varphi^k$ will be found and an upper bound $B(k, n)$ will be given
such that $\varepsilon^k(\varphi^k, \boldsymbol{\theta}) \le B(k, n)$ for all $\boldsymbol{\theta}$ and such that
$\lim_{n \to \infty} B(k, n) = 0$. For the discrete exponential family, which includes
the geometric, negative binomial, and Poisson families, an explicit rate
in $\log n$ and $n$ will be shown for $B(k, n)$. These results represent a considerable
improvement over those obtained by Samuel [11] who considered only the
case $k = 1$ and showed

$$\limsup_{n \to \infty}\, [R_n(\varphi, \boldsymbol{\theta}) - R_1(\boldsymbol{\theta}_n)] \le 0$$

for any fixed $\boldsymbol{\theta}$ in a parameter space more restricted than that
considered in this paper. If $\mathcal{P}$ is the class of binomial probability
density functions, a decision procedure is obtained which attains a
lower average risk than previously known procedures, and a rate of
convergence of this risk to its "standard" is obtained.








II. Preliminary Results.


In this chapter we shall first prove that $R_k(\boldsymbol{\theta}_n) = R_{k+1}(\boldsymbol{\theta}_n) +
f(k, n, \boldsymbol{\theta}) + h(k, n, \boldsymbol{\theta})$ where $f$ and $h$ have the properties stated
in Chapter I. We shall then develop a general theorem and corollary
which will enable us to obtain specific decision procedures in Chapter III.
Finally we shall prove several lemmas which will be useful in Chapter III.


Def. 5) Let $\mathbf{y}_n = (y_1, y_2, \ldots, y_n)$ be an arbitrary vector.
For $k \le n$ we define $\mathbf{y}_n^k = (y_{n-k+1}, \ldots, y_n)$.


Def. 6) For all $\boldsymbol{\theta}$, and all $k$, $n$ such that $1 \le k \le n$, let

$$Q_n^k(\mathbf{x}_k) = \frac{1}{n-k+1} \sum_{j=k}^{n} \prod_{\ell=1}^{k} p_{\theta_{j-k+\ell}}(x_\ell)$$

$$Q_n^{*k}(\mathbf{x}_k) = \frac{1}{n-k+1} \sum_{j=k}^{n} \theta_j \prod_{\ell=1}^{k} p_{\theta_{j-k+\ell}}(x_\ell)$$

While both $Q$ and $Q^*$ are functions of several variables which are
not explicit in the notation, it will always be clear in context what
arguments are intended. We note that $Q(\mathbf{x})$ is the unconditional
density function of a random vector $\mathbf{X}_k$ if the parameters $\theta_1, \ldots, \theta_k$
are assumed to be random variables $\Theta_1, \ldots, \Theta_k$ with a priori
distribution function $G_n^k$.








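Def. 7) For all $\boldsymbol{\theta}$, $k$, $n$ such that $1 \le k \le n$, let

$$\psi_n^k(\mathbf{x}_k) = \begin{cases} \dfrac{\sum_{j=k}^{n} \theta_j \prod_{\ell=1}^{k} p_{\theta_{j-k+\ell}}(x_\ell)}{\sum_{j=k}^{n} \prod_{\ell=1}^{k} p_{\theta_{j-k+\ell}}(x_\ell)} & \text{if } Q_n^k(\mathbf{x}_k) > 0 \\ 0 & \text{otherwise} \end{cases}$$

Let $m$ and $n$ be integers such that $1 \le k \le m$ and $n < \infty$; then
$\psi_n^k(\mathbf{X}_m^k)$ is one version of $E[\Theta_m \mid \mathbf{X}_m^k]$, and
$R_k(\boldsymbol{\theta}_n) = E[(\Theta_m - \psi_n^k(\mathbf{X}_m^k))^2] = E[\Theta_m^2 - (\psi_n^k(\mathbf{X}_m^k))^2]$.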
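
To see how the ratio of the two sums behaves and how the standards compare, the following sketch evaluates the estimator $\psi$ and the risk $E[\Theta^2] - E[\psi^2]$ for $k = 1$ and $k = 2$. It assumes the Poisson family as an illustrative $p_\theta$, a hypothetical alternating parameter sequence, and a truncation of the sample space at `x_max`; none of these choices come from the original text.

```python
import math
from itertools import product

def pmf(x, theta):
    """Poisson density, used here only as an illustrative p_theta."""
    return math.exp(-theta) * theta**x / math.factorial(x)

def q_and_qstar(thetas, k, x):
    """Return (Q, Q*) at x = (x_1..x_k): window averages of prod p and theta_j * prod p."""
    n = len(thetas)
    q = q_star = 0.0
    for j in range(k, n + 1):
        window = thetas[j - k:j]
        w = math.prod(pmf(xl, t) for xl, t in zip(x, window))
        q += w
        q_star += thetas[j - 1] * w
    return q / (n - k + 1), q_star / (n - k + 1)

def psi(thetas, k, x):
    """The ratio Q*/Q where Q > 0, and 0 otherwise."""
    q, q_star = q_and_qstar(thetas, k, x)
    return q_star / q if q > 0 else 0.0

def standard_risk(thetas, k, x_max=40):
    """R_k = E[Theta^2] - E[psi^2], with each coordinate of x summed over 0..x_max."""
    n = len(thetas)
    second_moment = sum(t * t for t in thetas[k - 1:]) / (n - k + 1)
    e_psi_sq = 0.0
    for x in product(range(x_max + 1), repeat=k):
        q, q_star = q_and_qstar(thetas, k, x)
        if q > 0:
            e_psi_sq += q_star * q_star / q
    return second_moment - e_psi_sq

thetas = [1.0, 5.0] * 6
r1 = standard_risk(thetas, 1)
r2 = standard_risk(thetas, 2)
print(r1, r2)
```

Because the parameters alternate, the preceding observation carries information about the current parameter, and the second order standard is visibly smaller than the first order one.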
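Def. 8) For all $k$, $j$ such that $1 \le k \le j$, let

$$r_j(\varphi, \boldsymbol{\theta}_j^k) = E[(\varphi(\mathbf{X}_j^k) - \theta_j)^2]$$

where $\varphi(\cdot)$ is an arbitrary non randomized estimator with a
$k$-dimensional argument, and let

$$R(\varphi, \boldsymbol{\theta}_n) = \frac{1}{n-k+1} \sum_{j=k}^{n} r_j(\varphi, \boldsymbol{\theta}_j^k).$$

$R(\varphi, \boldsymbol{\theta}_n)$ is then the Bayes risk in using the rule $\varphi(\mathbf{X}_k)$ as an estimate
of $\Theta_k$ when $\Theta_1, \ldots, \Theta_k$ have the a priori distribution $G_n^k$.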


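We now compare $R_k(\boldsymbol{\theta}_n)$ with $R_{k+1}(\boldsymbol{\theta}_n)$ and prove:


Theorem 1) For all $\boldsymbol{\theta}$, $k$, $n$ with $1 \le k < n < \infty$,
$R_k(\boldsymbol{\theta}_n) = R_{k+1}(\boldsymbol{\theta}_n) + f(k, n, \boldsymbol{\theta}) + h(k, n, \boldsymbol{\theta})$
where $f(k, n, \boldsymbol{\theta}) \ge 0$ and $|h(k, n, \boldsymbol{\theta})| = O(1/n)$ uniformly
in $\boldsymbol{\theta}$.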








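Proof:


Let $E_1[\cdot]$ refer to expectation with respect to $G_n^{k+1}$ and $E_2[\cdot]$
refer to expectation with respect to $G_n^k$. For any estimator $\varphi(\mathbf{X}_{k+1})$

$$R(\varphi(\mathbf{X}_{k+1}), \boldsymbol{\theta}_n) = E_1[(\varphi(\mathbf{X}_{k+1}) - \Theta_{k+1})^2]$$
$$= E_1[(\varphi(\mathbf{X}_{k+1}) - E_1[\Theta_{k+1} \mid \mathbf{X}_{k+1}])^2] + E_1[(E_1[\Theta_{k+1} \mid \mathbf{X}_{k+1}] - \Theta_{k+1})^2]$$

which of course is minimized for $\varphi(\mathbf{X}_{k+1}) = E_1[\Theta_{k+1} \mid \mathbf{X}_{k+1}]$. Letting
$\varphi(\mathbf{X}_{k+1}) = E_1[\Theta_{k+1} \mid \mathbf{X}_{k+1}^k]$ we obtain

$$R(E_1[\Theta_{k+1} \mid \mathbf{X}_{k+1}^k], \boldsymbol{\theta}_n) - R_{k+1}(\boldsymbol{\theta}_n) = E_1\{(E_1[\Theta_{k+1} \mid \mathbf{X}_{k+1}] - E_1[\Theta_{k+1} \mid \mathbf{X}_{k+1}^k])^2\}.$$

Let $f(k, n, \boldsymbol{\theta}) = E_1\{(E_1[\Theta_{k+1} \mid \mathbf{X}_{k+1}] - E_1[\Theta_{k+1} \mid \mathbf{X}_{k+1}^k])^2\}$ and

$h(k, n, \boldsymbol{\theta}) = R_k(\boldsymbol{\theta}_n) - R(E_1[\Theta_{k+1} \mid \mathbf{X}_{k+1}^k], \boldsymbol{\theta}_n)$.

Then it remains only to show $|h(k, n, \boldsymbol{\theta})| = O(1/n)$ uniformly in $\boldsymbol{\theta}$.
Since

$$R_k(\boldsymbol{\theta}_n) = E_2[\Theta_k^2] - E_2[(E_2[\Theta_k \mid \mathbf{X}_k])^2]$$
$$R(E_1[\Theta_{k+1} \mid \mathbf{X}_{k+1}^k], \boldsymbol{\theta}_n) = E_1[\Theta_{k+1}^2] - E_1[(E_1[\Theta_{k+1} \mid \mathbf{X}_{k+1}^k])^2]$$

and since $\Omega$ is a bounded interval, say $\theta \in \Omega \Rightarrow |\theta| \le B < \infty$, we have:

$$|h(k, n, \boldsymbol{\theta})| \le |E_2[\Theta_k^2] - E_1[\Theta_{k+1}^2]| + |E_2[(E_2[\Theta_k \mid \mathbf{X}_k])^2] - E_1[(E_1[\Theta_{k+1} \mid \mathbf{X}_{k+1}^k])^2]|$$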











and 
n n 
Be) E (e a E Ds 
l'"k as k+l n-k>+1 P y a 3 
n 
5 1 f 2 2 
Sia, > SD) a J t(n k) 6, 
2 
B 
= ne ix 
Also 
2 
a 0 E fE CA 
2 
n 
i He f= τ. ο κ κό j=k £=1  "3-k+£ 
B > ΕΤ 
k Ῥ (x,) 
A j-k 4-1 n : 
: k ^ n X 
TE (x,) Σ. |... ee) 
τα | £=1 kl j=k+1 £=1  “3-k+£ ulax ) 
n-k =k 
T. T 
T" =] ne -k+£ 


For fixed Xy the expression inside the braces may be written as 


a ta 3 bo eb a E b 
e gcc heus n 
lis a n-k+1 a E | 

n n 


Alle 














n k k 
where an = | > E l| Po (x,) ge O, || Ῥρ (x) 
j=k+1 f=] j-k+2£ = 4 
n k k 
m Σ N ΕΝ (x) bz |! Po (x, 
j=k+1 4-1  "3-k44 £=1 4 
and 
a ta : O a E b 
ee EAN «Σ 
Γον Db neck ct d b n- k 
n n 




















(n - k)b (a, + a - (n-k+ ' + Ὁ) α΄ 


= (n- k)in-k+1)(b. + b)b 
n n 





an 
Den n 


! 

| 

| Fr ΠΠ. + b)b 
n n 


(n - k)(2b aa t b E - b a) -ba 





IA 


2a a 2 b an 
Be. ο LA e n 
n- k + τή ο eect a) (b + b)b } 

El n n n 





dE 
+ 
(n- k)([n- k * 1 


b E 
mx 











LR o 
Ό SEIS 
n 


From the definition of a bo a, and b it is clear that 


I2|«B El<sB b >0 b»0 πο παῖ 
DJ- ike = 





2a a 2a a a x o 
ara opa lol a 
n n n 





































. 


























2 2 
an a 2 
< 
b HDi 2 SE E 
n b 





Thus we have 


pa, (5,18, 12,1) E (EL O Zeer | 





2 B^(b. + b) 
Β΄ Ὁ 
s Jl + AD a) 
pk 


rat (ih, (x) ) taz) 


- 
Meere) I i np Gu) ua) 


RR 
I 
Y ; Po (x,) 
B j=k+l f=1 " j-k+2 
+e | n-k (dx) 
RR 
ice E: „2 


“hek+i -im k+ n-k*1^* 







TBO 


Hence $|h(k, n, \boldsymbol{\theta})| = O(1/n)$ uniformly in $\boldsymbol{\theta}$.

Q.E.D.

The implications of theorem 1) were discussed in Chapter I. We now
state and prove a generalized form of a lemma of Samuel [11].


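Lemma 1) For all $\boldsymbol{\theta}$, $k \ge 1$, $n \ge k$

$$\frac{1}{n-k+1} \sum_{i=k}^{n} E[(\psi_i^k(\mathbf{X}_i^k) - \theta_i)^2] \le R_k(\boldsymbol{\theta}_n)$$


Proof: Fix $\boldsymbol{\theta}$, $k$, and $n$. Using the expressions $r_i(\psi_i^k, \boldsymbol{\theta}_i^k)$ and
$R(\psi_n^k, \boldsymbol{\theta}_n)$ given in definition 8), and observing that $R(\psi_n^k, \boldsymbol{\theta}_n) = R_k(\boldsymbol{\theta}_n)$,
we have: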


n 


1 k 2 
ΓΙ D Bl (Vs (5) - 037 


n 1 qp 
κ k sab NR 
"uL Σ τινὶ, 2) - Σ τὶ, 2) 





wo Sd ee a a 

n-k A l7 AS i? i-k 
i pu k k k 

---ππι ΜΟΙ Oe) NM 


isk 
« R(S, 6.) - R.(0 ). 


Q.E.D.








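Def. 9) A decision procedure $\varphi^k = (\varphi_1^k(\mathbf{X}_1), \varphi_2^k(\mathbf{X}_2), \ldots,
\varphi_n^k(\mathbf{X}_n), \ldots)$ is asymptotically optimal of $k$th order if for all $\boldsymbol{\theta}$

$$\limsup_{n \to \infty} \left[ \frac{1}{n} \sum_{i=1}^{n} E[(\varphi_i^k(\mathbf{X}_i) - \theta_i)^2] - R_k(\boldsymbol{\theta}_n) \right] \le 0.$$

If

$$\limsup_{n \to \infty} \left\{ \sup_{\boldsymbol{\theta}} \left[ \frac{1}{n} \sum_{i=1}^{n} E[(\varphi_i^k(\mathbf{X}_i) - \theta_i)^2] - R_k(\boldsymbol{\theta}_n) \right] \right\} \le 0$$

then $\varphi^k$ is uniformly asymptotically optimal of $k$th order.


We shall now state and prove a general theorem, which with its corollary
will enable us to obtain the results of Chapter III.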


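Theorem 2) For a bounded interval $\Omega = [\alpha, \beta]$, family of densities $\mathcal{P}$, and
integer $k$, let $\varphi^k$ be a decision procedure such that for all $i$,
$P[\varphi_i(\mathbf{X}_i) \in \Omega] = 1$. Suppose there exist non negative functions $\delta_i(\boldsymbol{\theta}, \mathbf{x}_i)$,
$\varepsilon_i(\boldsymbol{\theta}, \mathbf{x}_i)$, and $a_i(\boldsymbol{\theta})$ such that for all $\boldsymbol{\theta}$, $\mathbf{x}_i$, $i \ge k$

a) $P[\,|\varphi_i(\mathbf{X}_i) - \psi_i(\mathbf{x}_i)| \ge \delta_i(\boldsymbol{\theta}, \mathbf{x}_i) \mid \mathbf{X}_i^k = \mathbf{x}_i\,] \le \varepsilon_i(\boldsymbol{\theta}, \mathbf{x}_i)$

b) $\lim_{n \to \infty} \left\{ \frac{1}{n} \sum_{i=k}^{n} E[\delta_i(\boldsymbol{\theta}, \mathbf{X}_i) + \varepsilon_i(\boldsymbol{\theta}, \mathbf{X}_i) \mid Q_i(\mathbf{X}_i) \ge a_i(\boldsymbol{\theta})] + \frac{1}{n} \sum_{i=k}^{n} P[Q_i(\mathbf{X}_i) < a_i(\boldsymbol{\theta})] \right\} = 0$
uniformly in $\boldsymbol{\theta}$

where the functions $Q_i$ and $\psi_i$ are as given by definitions 6) and
7).

Then the decision procedure $\varphi^k$ is uniformly asymptotically optimal
of $k$th order, and moreover

$$\varepsilon^k(\varphi^k, \boldsymbol{\theta}) \le \frac{(k-1)(\beta - \alpha)^2}{n} + \frac{2(\beta - \alpha)}{n} \sum_{i=k}^{n} E[\delta_i + (\beta - \alpha)\varepsilon_i \mid Q_i \ge a_i] + \frac{2(\beta - \alpha)^2}{n} \sum_{i=k}^{n} P[Q_i < a_i]$$

where $\varepsilon^k(\varphi^k, \boldsymbol{\theta})$ is given by definition 4).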


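Proof: Clearly it is sufficient to prove the upper bound for $\varepsilon^k(\varphi^k, \boldsymbol{\theta})$
is correct.

We first represent $\varepsilon^k$ as a sum of several functions, and then examine
each of these functions. We shall consider $k$ and $n$ fixed.

Let:

$$H_1^k(\varphi^k, \boldsymbol{\theta}) = \frac{1}{n} \sum_{i=1}^{k-1} (E[(\varphi_i^k - \theta_i)^2] - R_k(\boldsymbol{\theta}_n))$$

$$H_2^k(\varphi^k, \boldsymbol{\theta}) = \frac{1}{n} \sum_{i=k}^{n} (E[(\varphi_i^k - \theta_i)^2] - E[(\psi_i^k - \theta_i)^2])$$

$$H_3^k(\varphi^k, \boldsymbol{\theta}) = \frac{1}{n} \sum_{i=k}^{n} (E[(\psi_i^k - \theta_i)^2] - R_k(\boldsymbol{\theta}_n))$$

For the remainder of the proof we delete the superscript $k$. Clearly

$$\varepsilon(\varphi, \boldsymbol{\theta}) = \frac{1}{n} \sum_{i=1}^{n} E[(\varphi_i - \theta_i)^2] - R_k(\boldsymbol{\theta}_n) = H_1(\varphi, \boldsymbol{\theta}) + H_2(\varphi, \boldsymbol{\theta}) + H_3(\varphi, \boldsymbol{\theta}).$$

Let $B = (\beta - \alpha)$. Then $E[(\varphi_i - \theta_i)^2] \le B^2$
and $R_k(\boldsymbol{\theta}_n) \le B^2$; hence

$$H_1(\varphi, \boldsymbol{\theta}) \le \frac{(k-1)B^2}{n}.$$

It follows immediately from Lemma 1) that
$H_3(\varphi, \boldsymbol{\theta}) \le 0$. It remains only to examine $H_2(\varphi, \boldsymbol{\theta})$.

$$H_2(\varphi, \boldsymbol{\theta}) = \frac{1}{n} \sum_{i=k}^{n} E[(\varphi_i - \psi_i)(\varphi_i + \psi_i - 2\theta_i)]$$
$$\le \frac{2B}{n} \sum_{i=k}^{n} E[\min(\delta_i + B\varepsilon_i,\ B)]$$
$$\le \frac{2B}{n} \sum_{i=k}^{n} E[\delta_i + B\varepsilon_i \mid Q_i \ge a_i]\,P[Q_i \ge a_i] + \frac{2B^2}{n} \sum_{i=k}^{n} P[Q_i < a_i]$$

The desired result follows immediately.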


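Corollary. If condition b) of theorem 2) is replaced by

b') $\lim_{i \to \infty} E[\delta_i(\boldsymbol{\theta}, \mathbf{X}_i) + \varepsilon_i(\boldsymbol{\theta}, \mathbf{X}_i)] = 0$ uniformly in $\boldsymbol{\theta}$

then $\varphi^k$ is uniformly asymptotically optimal in the $k$th order.

Proof: From the proof of theorem 2) it is enough to show
$\limsup_{n \to \infty} \{\sup_{\boldsymbol{\theta}} H_2(\varphi^k, \boldsymbol{\theta})\} \le 0$, recalling that $H_2(\varphi, \boldsymbol{\theta})$ is a function of $n$.
But

$$H_2(\varphi^k, \boldsymbol{\theta}) \le \frac{2B}{n} \sum_{i=k}^{n} E[\delta_i + B\varepsilon_i]$$

$\to 0$ uniformly in $\boldsymbol{\theta}$ as $n \to \infty$ since uniform convergence
implies uniform convergence in Cesaro mean.

Q.E.D.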
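
The last step of the corollary rests on the fact that a sequence converging to zero also converges to zero in Cesaro mean. A minimal numerical sketch of that fact follows; the particular sequence $c_i = 1/\sqrt{i}$ is an arbitrary illustrative choice and is not taken from the text.

```python
import math

def cesaro_means(seq):
    """Running averages (c_1 + ... + c_n) / n for n = 1, 2, ..."""
    means, total = [], 0.0
    for n, c in enumerate(seq, start=1):
        total += c
        means.append(total / n)
    return means

terms = [1.0 / math.sqrt(i) for i in range(1, 10001)]  # c_i -> 0
means = cesaro_means(terms)
print(terms[-1], means[99], means[-1])
```

The averages decay more slowly than the terms themselves, which is why a rate obtained for the individual terms does not transfer unchanged to the average.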
We turn now to several lemmas which will be useful in Chapter III.
Lemmas 2), 3), and 4) will be used to establish condition a) of theorem 2),
while lemma 5) will be used in evaluating certain limits.

The first of these is an inequality proved by Hoeffding [2], which
we state here without proof.


Lemma 2) If $X_1, X_2, \ldots, X_n$ are independent and $a \le X_i \le b$
for $i = 1, 2, \ldots, n$, then for all $t > 0$

$$P[\bar{X} - E[\bar{X}] \ge t] \le e^{-2nt^2/(b-a)^2}$$

where $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$.
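
As a sanity check on an inequality of this type, the sketch below compares the commonly cited one-sided Hoeffding bound $\exp(-2nt^2/(b-a)^2)$ with the exact upper tail of a Bernoulli sample mean. That this is the exact form used in lemma 2) is an assumption, and the function names are illustrative.

```python
import math

def hoeffding_bound(n, t, a=0.0, b=1.0):
    """One-sided Hoeffding bound for the mean of n independent variables in [a, b]."""
    return math.exp(-2.0 * n * t * t / (b - a) ** 2)

def exact_upper_tail(n, p, t):
    """P[Xbar - p >= t] when Xbar is the mean of n independent Bernoulli(p) variables."""
    threshold = n * (p + t)
    return sum(
        math.comb(n, s) * p**s * (1 - p) ** (n - s)
        for s in range(n + 1)
        if s >= threshold - 1e-12  # small slack against floating point round-off
    )

for n in (10, 50, 200):
    for p in (0.1, 0.5):
        for t in (0.05, 0.2):
            print(n, p, t, exact_upper_tail(n, p, t), hoeffding_bound(n, t))
```

The exact tail never exceeds the bound, and the gap widens as $n$ grows.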










Lemma 3) Let ο ο ος 


πο ο ; τ. ΤΕ 
n i- 


a sequence of random variables such that for some k >Q and 


Wel by 2, «+. εξ the random variables Xs Xe? X. Loy? ves are 
mutually independent. Then 
τς 2 
en 
= = 2 + 

P[|X - E[X]| >5] < 2ke es 

m 
Proof: Let Sy = X pi where m is defined as the integer such 

j=0 © 

Ty ΤΙ = 1 E 1 
that EC Ti «πετ and X, =0 for Qm Tet y. E[S l. Let 
O 
A, = event ER - y. ==. BUG 6, ls tie suo mc s παρε ο 
3 B 2 C i 


random variables and from lemma 2) we have: 





m 
: on 
P[A,] = P| D AE E ^ >| 
J=0 
m 
ile on 
- " + 1! Σι ΟΝ 7l T 
2 2 
on 
e η 
ec 
n* 52 
Εκ(αεκ) 2 
ΞΕ 9 
so that:


k 
PLIE - BIR] 21 = PfI È (s, 791 280] 
En 


N 
m 
I 
HJ 
— 
a 
το 
ps 
! 
M 
N 
C 
-! 
—Á——— 


I 
Hj 
E | 
j=- 
en 
us 
EA 
A 
DI» 
Hj 
tr 
> 
pP. 
[d 


IA 
nN 
"m 
(D 


Lemma 4) Suppose for non negative random variables E 
Lemma 5) Let F be an absolutely continuous distribution
function with corresponding density function f. Let


$C_M = \{x : |f''(x)| < M \text{ and } f''(x) \text{ is continuous}\}$. Let


$D = \{x : |f'(x)| > 0\}$. Let $\gamma \in (-\infty, \infty)$.
Proof: Using the mean value theorem and the Taylor expansion we have

The proof in case ii) is similar.


III. Results.


We now turn to the task of finding asymptotically optimal procedures,
bounds, and rates of convergence for specific classes of distributions.
We shall also look at a modification of our problem in a very general
class of distributions. Finally we shall consider the "empirical Bayes"
problem.

The notation we shall develop and use is inherently cumbersome; to
ease its burden somewhat we shall not always indicate all possible
dependencies and shall not always indicate one or more of the arguments
of a function. Hopefully no misunderstanding will arise because of this
practice.


A. A special discrete class of distributions.


We first consider a special discrete class of distributions defined 


on the non negative integers as follows: 


$$P[X = x \mid \theta] = p_\theta(x) = \theta^x h(\theta) g(x), \qquad x = 0, 1, 2, \ldots$$

where:

iv)


If B <1 then there exists a constant b such that 
g(x) < x? for all but finitely many integers x. 


PE poc then there exists a constant b such that 





μι. -5 for all but finitely many integers x. 
x 


All of these restrictions are quite mild. The third prevents $g$ from
oscillating wildly as its argument progresses through the integers. The
second and fourth conditions restrict slightly the rate at which
$\theta^x g(x) \to 0$ as $x \to \infty$.


Examples of such a class are:

Type                 h(θ)           g(x)
Poisson              e^{-θ}         1/x!
geometric            1 - θ          1
negative binomial    (1 - θ)^a      C(a+x-1, x)      (a assumed known)

The conditions are easily seen to be satisfied.


Recalling that $\mathbf{x}_j^k = (x_{j-k+1}, \ldots, x_j)$, $j = k, k + 1, \ldots$

0 otherwise

Theorem 3) For the problem defined in this section
$\varphi^k = (\varphi_1^k, \varphi_2^k, \ldots)$ is uniformly asymptotically optimal of $k$th
order for $k = 1, 2, \ldots$, and $\varepsilon^k(\varphi^k, \boldsymbol{\theta}) \le B(k, n)$ with $B(k, n) \to 0$.
Proof: We shall show the conditions of Theorem 2) are satisfied.


Fix $k$. Fix $\boldsymbol{\theta}$.


Recall $Q_i(\mathbf{x}_k) = \frac{1}{i-k+1} \sum_{j=k}^{i} \prod_{\ell=1}^{k} p_{\theta_{j-k+\ell}}(x_\ell)$, $i = k, k + 1, \ldots$


We observe that there exists a set $R_i$ in $k$ dimensional space such
that $P[\mathbf{X}_i^k \in R_i] = 1$ and $\mathbf{x}_k \in R_i \Rightarrow g(x_k) Q_i(\mathbf{x}_k) > 0$. We then have

We are now in a position to apply lemma 4) to obtain the functions
$\delta_i$ and $\varepsilon_i$ for condition a) of theorem 2). If we substitute in


lemma 4)
k 
H} = g(x, )Q,(x,, eo ee 1) 


Uy = BOG + 1)Q, (x) 


we have for all $i = 2k, 2k + 1, \ldots$

We have now produced the inequalities for condition a) of theorem 2).


We must now choose functions $\delta_i(\mathbf{x}_i, \boldsymbol{\theta})$ and $\varepsilon_i(\mathbf{x}_i, \boldsymbol{\theta})$ subject to the
conditions that

k g(x,)
ο ποιο ο ον


and such that for $Q_i$ and some $a_i$, condition b) of theorem 2) is
satisfied.


We first prove the following lemma. 
Proof: We first consider the case k = 1. Let d be the smallest

easily verified fact that h is a decreasing function.

If $\beta > 1$ then we have from condition iv) that

Let M, be the smallest integer such that M; AL ΠΡΟΣ
7 M 
7271. Then P “n( 0) @(M, ) < 1; and hence m, <M 
where m. is as defined for the case k = 1. We shall now consider the
two bracketed parts of the right hand side of the last equation separately.
The second bracketed expression is clearly seen to be less than or equal
to M*n* log n for some M* < ∞ by the argument used in the case k = 1.
The first bracketed expression can now be broken up into k expressions,
k - 1 of which are less than or equal to M*n* log n with the remaining


expression being 


m.-l m.-1 : 

n T i a k 2 
same arguments as in the case k = 1, for those (k-1)-tuples
$(x_1, \ldots, x_{k-1})$ such that the indicator function is not zero for

Upon combining the above result with the previous k inequalities we
have the desired conclusion, and the proof of the lemma is complete.

We turn now to the task of showing condition b) of theorem 2) is
satisfied for a suitable choice of $\delta_i$ and $\varepsilon_i$. The theorem will
then give us an upper bound for $\varepsilon^k(\varphi^k, \boldsymbol{\theta})$ and we shall then see it
has the claimed rate of convergence to zero. Recalling

Using the definitions of $\delta_i$ and $\varepsilon_i$ we have


S = Uk exp 


- Q, log i 2 HE - k +1) er | 
ale
whenever E » Thus

Collecting terms we see that condition b) is satisfied and

We have now given an upper bound for $\varepsilon^k(\varphi^k, \boldsymbol{\theta})$ for all n, and it
remains to find the rate at which this bound goes to zero. Examining
the various components of the upper bound we have

Hence $\varepsilon^k(\varphi^k, \boldsymbol{\theta})$ converges to zero at the claimed rate,
uniformly in $\boldsymbol{\theta}$.

Q.E.D.


We observe that the sum of $r$ independent identically distributed
random variables, each with density function $p_\theta(x) = \theta^x h(\theta) g(x)$, has
the density function $f_\theta(y) = \theta^y h(\theta)^r g^{(r)}(y)$ where $g^{(r)}(y)$ is the
$r$-fold convolution of $g$. Thus the density function of the sum has
the same form, and if conditions ii), iii), and iv) are satisfied for
$g^{(r)}$ then theorem 3) may be used even if the original problem is
modified to allow $r$ independent observations for each $\theta_i$, observing
that the sum is sufficient for $\theta_i$. For the geometric, negative
binomial, and Poisson families $g^{(r)}$ satisfies the conditions.
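
The closure of this family under sums can be verified directly in the Poisson case, where $h(\theta) = e^{-\theta}$ and $g(x) = 1/x!$, so that the $r$-fold convolution should equal $r^y / y!$. The sketch below uses a plain truncated convolution; the helper names and the truncation length are illustrative assumptions.

```python
import math

def convolve(u, v):
    """Convolution of two sequences indexed from 0, truncated to the shorter length.
    Entry y only uses indices 0..y, so the retained entries are exact."""
    m = min(len(u), len(v))
    return [sum(u[s] * v[y - s] for s in range(y + 1)) for y in range(m)]

def r_fold_convolution(g, r):
    """g convolved with itself r times in total (r = 1 returns a copy of g)."""
    out = list(g)
    for _ in range(r - 1):
        out = convolve(out, g)
    return out

g_poisson = [1.0 / math.factorial(x) for x in range(15)]
g3 = r_fold_convolution(g_poisson, 3)

theta = 0.7
# Density of the sum of 3 observations, written in the form theta^y h(theta)^r g^(r)(y).
f_sum = [theta**y * math.exp(-theta) ** 3 * g3[y] for y in range(15)]
print(g3[:4], f_sum[:4])
```

The resulting density agrees with that of a Poisson variable with mean $3\theta$, as expected for a sum of three independent Poisson observations.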

B. The modified negative binomial distribution.


Let

$$P[X = x \mid \theta] = p_\theta(x) = \binom{a + x - 1}{x} \left(\frac{\theta}{a + \theta}\right)^{x} \left(\frac{a}{a + \theta}\right)^{a}, \qquad x = 0, 1, 2, \ldots$$

This reparameterization of the negative binomial is of interest for two
reasons. First, for all $a$, $E[X \mid \theta] = \theta$, unlike the usual parameterization.
Secondly the form of the decision procedure is different from that
usually encountered.

In this and the following sections we shall not give as many
details as in Section A. The method of proving asymptotic optimality is
similar in each of these sections, and may be summarized as follows:


Q(x, ) 


k x ; 
Since v.(x,) ue when Q, (x, ) Oy ve a ES an estimator 
k ΠΕΝ 
. . τι M ` 
9. (x) which is for "most" x, equal to a ratio Bix)? such that 
P e το v yk 
— = E — 

EMT XIX, al ο ο) and such that a ο» | 


Then using the methods of Section A the functions $\delta_i$
and $\varepsilon_i$ may be obtained for condition a) of theorem 2). It is then
only necessary to show either condition b) or b') holds.





Let: 
. k 
x XE 





Y G3) B 
O otherwise 
a E 
a | D = O, im 
T 
Sour deeem mn TT 
| E A 
gx; t). if there exists t e Ll, 9), v: Suchen, 
x = (χι; coe ο Xp x. + b») 
AQ) = 
ο otherwise 
il 
2 Y. (2) 


ES 
e e k + 1 
; Qa? 
P) if ee > QO 
aes 
ο otherwise 










k Jee k 
E * i - Š 
9,(X,) Ου X Je Be ee 
k 
B oe wie B A ee 
. e Φ LJ k k k LJ 
Using the above definitions we shall show that $\varphi^k = (\varphi_1^k, \varphi_2^k, \ldots)$ is
uniformly asymptotically optimal of $k$th order, with a bound on
$\varepsilon^k(\varphi^k, \boldsymbol{\theta})$ involving a power of $\log n$.
Recall Q(x) -------- Σ p = 
= = + θ 
i-k’ “f-k+1 en ο 


i 
G(s ) =- ce Ῥ (x) 
ο πρ ο. ο 


We observe there exists a set Ri of k dimensional vectors such that 
P[X*cR.] cand x cR az) LOO ERSTES INS ' ο and X,€R, 


qx) 


k . Be 
Vig) p HER » For i? 2k, X ER,» j = K; ose , i- k we have 


pP j-k+£ ο 
We also have


oo k-l 
E[Z.DIXP- xl- Y aa Op (x, * Ὁ) Lor ep 














a j-k+é 
But 
οο oo a x = l - ch 8 xtt 
κ g(x, t)p (x + t) = e 8 . a a zi 
2 e 
= ap.(x) à», [στ 7 
P ee 
Hence:

fe. Ole 1 : ieee

The remainder of the proof follows that in Section A).


C. The binomial distribution.


Let $P[X = x \mid \theta] = p_\theta(x) = \binom{a}{x} \theta^x (1 - \theta)^{a-x}$, $x = 0, 1, \ldots, a$, where
$a$ is a known positive integer and $0 < \theta < 1$. For this family it is
necessary to modify slightly the definition of asymptotic optimality.
Robbins [5] and others have demonstrated why this modification is
necessary. Let $R_{k,a}(\boldsymbol{\theta}_n)$ be the $k$th order standard as defined in definition 3)
with the parameter value $a$. We shall develop a procedure $\varphi^k$ such that

c) for all $\boldsymbol{\theta}$, $\limsup_{n \to \infty}\, [R_n(\varphi^k, \boldsymbol{\theta}) - R_{k,a-1}(\boldsymbol{\theta}_n)] \le 0$.

Such a procedure will be said to have property c). In addition we shall
show $R_n(\varphi^k, \boldsymbol{\theta}) - R_{k,a-1}(\boldsymbol{\theta}_n) \le B(k, n)$ uniformly in $\boldsymbol{\theta}$.

We shall first exhibit a procedure having property c). We shall
then introduce a new procedure which not only has property c) but for
most $\boldsymbol{\theta}$ actually improves upon the original procedure at each stage and
produces strict inequality in equation c).

We first assume that corresponding to every observation $X$ we have
available the related observation $X'$ which would have resulted had we
observed a binomial random variable with parameter $a - 1$. For example,
if $X$ is the number of successes in $a$ independent Bernoulli trials
with probability $\theta$ of success, then $X'$ is the number of successes in
the first $a - 1$ of these $a$ trials. While in most situations this
assumption will hold, we shall see later that it will not be needed.
Xi o jog qe"


it 


1 it <A, ο) j= Ky Ieee mee 


We shall now show $\varphi^k = (\varphi_1^k, \varphi_2^k, \ldots)$ has property c). We shall
proceed as in the previous examples, noting that theorem 2) is still true
when the property of asymptotic optimality is replaced by property c).


wee ek let: 


1 TERE 
earal 
i IKE AL ik 1 


R. be a set of k dimensional vectors such that 
a - 1

Thus arguing as in Section A)

We now look for an upper bound for the quantity


n IT E 
» P | D Ll Po Ce < o We shall show an upper bound is 
i=k J=k £=1 j-krl 
a subsequence of integers $\{i_\nu\}$ (possibly finite) such that


i 
ῇ Pa (x) <ites i- i for some v - 1, 2, ... . Let 
J=k [-1 j-k-Z 
i ο ου for all mec there exists a e such that 


EM sn v« ν and: 
» [y [n αν), SD xo p


ik ick del kr 221 Fi-kts 
MA k
< Il p Ben « n* 
το ο £ Mur 
Since there are $(a + 1)^k$ different $\mathbf{x}_k$ to sum over, we have shown
the claimed upper bound holds.

It is now easy to show, using an argument similar to that in


Section A, that an upper bound for R (9, 2) is 
n 
DI, à log 
pa We um z ο - 
n ας k) n 2 Uh 
ο 


n 
+ Bk » exp |i log i D zs 2li -k +1) = +1) ee 7 
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and cleariy this upper bound is uniformly of the order D » The 


desired conclusion follows.

In the above procedure we chose at times to neglect the results of
the $a$th trial in many of the observations. This choice of which
information to neglect was quite arbitrary, and it is easily seen that
the above proof does not depend on which trial's information was neglected.
We may thus conclude that if in some situation the related observation
X' is not obtainable, we may construct a new X' which will do as well.


An example will illustrate. Suppose a = 17 and X = 10. With the aid
of some random device we let

$$X' = \begin{cases} 9 & \text{with probability } 10/17 \\ 10 & \text{with probability } 7/17 \end{cases}$$

This X' will work as well as the original X'.
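The randomized construction in this example can be checked exactly. The sketch below is illustrative only: the function names are hypothetical, and it assumes the random device discards one of the a trials uniformly at random, so that a success is discarded with probability X/a. Under that assumption the constructed X' has exactly the binomial distribution with parameter a - 1.

```python
from fractions import Fraction
from math import comb


def binom_pmf(n, theta, x):
    """Exact Binomial(n, theta) probability of x successes."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


def drop_one_trial(a, x):
    """Distribution of X' given X = x when one of the a trials is discarded at random."""
    # Assumption: the discarded trial is a success with probability x / a.
    return {x - 1: Fraction(x, a), x: Fraction(a - x, a)}


def constructed_pmf(a, theta):
    """Exact pmf of the constructed X' when X is Binomial(a, theta)."""
    pmf = [Fraction(0)] * a  # X' takes the values 0, ..., a - 1
    for x in range(a + 1):
        for value, prob in drop_one_trial(a, x).items():
            if prob > 0:
                pmf[value] += binom_pmf(a, theta, x) * prob
    return pmf
```

With a = 17 and X = 10, `drop_one_trial(17, 10)` gives the two probabilities 10/17 and 7/17 of the example, and `constructed_pmf(17, theta)` agrees term by term with the Binomial(16, theta) probabilities for any rational `theta`.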


We shall now exhibit a procedure $\bar{\varphi}^k$ which improves upon $\varphi^k$. At
the $i$th stage of the decision problem we could have defined $\varphi_i^k$ in
any one of several different ways, depending on what information we


chose to neglect. For fixed i >k and Xy there are ti ways to 


hake. 
define xt and hence Y). Thus there are & (4 ) ways to 


z 
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ee k-1 
ΠΕ Ρι(κι). Similarly there are a ways to define zx) and 


hence Q,U-1) (Gi) 


„(2k-1)(i-k+1) 


ways to define P,(x,)- Thus there are 
ways to define ο, From the above, and observing 


the X' used in P*(X*) could have $a^k$ different definitions, we
see there are at least $a^{(2k-1)(i-k+1)+k}$ ways to define the random
variable $\varphi_i^k(X_i)$. Most of these different definitions will result in
essentially the same estimator for large i. We may obtain an improved
procedure, however, by considering some of them.


We define $\varphi_{i,u}^k$, $u = 1, \ldots, a^k$, as follows: Let $u = 1, \ldots, a^k$
be an indexing of the $a^k$ distinct k-tuples each of whose elements are
integers from the set $\{1, 2, \ldots, a\}$. Let the $u$th k-tuple be
$(t_{u,1}, \ldots, t_{u,k})$. Let $X^{(\ell)}$, $\ell = 1, \ldots, a$, be the random variable
derived from X by not counting the result of the $\ell$th trial. Hence
$X^{(a)}$ equals the previously defined X'. We now define


(o) (ty) (5 κ) 


u,k 
X (u) to be the random vector (X. x DELE ye 


example, 


u,l 


-kt] 7 a 


ED of (X.) = p(x“ (u)? EE ος χω κι i SS πη 


l a E 
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Let: R(p, 9,) = El(p(X,) - n 


$$\bar{\varphi}_i^k = a^{-k} \sum_{u=1}^{a^k} \varphi_{i,u}^k$$

$$\bar{\varphi}^k = (\bar{\varphi}_1^k, \bar{\varphi}_2^k, \ldots)$$


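We shall now show $\forall \theta$, $\forall i$, $R(\bar{\varphi}_i^k, \theta_i) \le R(\varphi_i^k, \theta_i)$ where $\varphi_i^k$ is
as previously defined. We first observe that the $\varphi_{i,u}^k(X_i)$ are
identically distributed as $\varphi_i^k(X_i)$. Then, suppressing the superscript k,
we have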


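$$R(\bar{\varphi}_i, \theta_i) = E[(\bar{\varphi}_i(X_i) - \theta_i)^2]$$

$$= a^{-2k} E\Big[\Big(\sum_{u=1}^{a^k} \varphi_{i,u}\Big)^2\Big] - 2\theta_i a^{-k} \sum_{u=1}^{a^k} E[\varphi_{i,u}] + \theta_i^2$$

$$= a^{-2k} \Big\{ \sum_{u=1}^{a^k} E[\varphi_{i,u}^2] + 2 \sum_{u<v} E[\varphi_{i,u}\varphi_{i,v}] \Big\} - 2\theta_i a^{-k} \sum_{u=1}^{a^k} E[\varphi_{i,u}] + \theta_i^2$$

$$= a^{-k} \sum_{u=1}^{a^k} E[(\varphi_{i,u} - \theta_i)^2] - a^{-2k} \Big\{ (a^k - 1) \sum_{u=1}^{a^k} E[\varphi_{i,u}^2] - 2 \sum_{u<v} E[\varphi_{i,u}\varphi_{i,v}] \Big\}$$

$$= R(\varphi_i, \theta_i) - a^{-2k} \sum_{u<v} E[(\varphi_{i,u} - \varphi_{i,v})^2]$$

$$\le R(\varphi_i, \theta_i).$$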
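The risk comparison above rests on a pointwise algebraic identity: for any s numbers, the squared error of their average equals the average of their squared errors minus $s^{-2}$ times the sum of squared pairwise differences. The following sketch checks that identity in exact arithmetic. The function names and the sample values are hypothetical, and the values simply stand in for the $s = a^k$ realized estimates.

```python
from fractions import Fraction
from itertools import combinations


def averaged_loss(values, theta):
    """Squared error of the average of the s candidate estimates."""
    s = len(values)
    mean = sum(values, Fraction(0)) / s
    return (mean - theta) ** 2


def mean_individual_loss(values, theta):
    """Average of the s individual squared errors."""
    s = len(values)
    return sum(((v - theta) ** 2 for v in values), Fraction(0)) / s


def decomposed_loss(values, theta):
    """Mean individual loss minus s^(-2) times the sum of squared pairwise differences."""
    s = len(values)
    spread = sum(((u - v) ** 2 for u, v in combinations(values, 2)), Fraction(0))
    return mean_individual_loss(values, theta) - spread / s**2
```

Taking expectations of both sides, and using the fact that the individual estimates are identically distributed, gives the displayed relation between the two risks. The subtracted term vanishes only when all the estimates coincide.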


We notice strict inequality holds unless $P[\varphi_{i,u}(X_i) = \varphi_{i,v}(X_i)] = 1$
for all $u, v$ with $1 \le u, v \le a^k$. Thus we have strict inequality holding
unless a = 1 or unless the vector $(\theta_1, \ldots, \theta_i)$ is composed of 1's and 0's. To
investigate the asymptotic properties of $\bar{\varphi}$ we observe:


lim 3 R(9,, 2) - XNCAT 


n- 0ο 


- Tm {2 b DG, 8) -n(es 201 + Σ πίφι, 6) Bo 18) 


Ir 0 i=] 


ES ip Ξ T [R(9,, E) - Ἀ(φ,, 8 ER), since q has property c) 


I 


B 


------ Al y u > 2 
lim 4- = = El(p, _ - e, _) i} 
ee ae Pair 


n © 


ο 


Hence $\bar{\varphi}$ has property c). In order for strict inequality to hold it is
sufficient that there exists $\epsilon > 0$ such that, with the possible
exception of a finite number of values of i, $\sum_{u<v} E[(\varphi_{i,u} - \varphi_{i,v})^2] \ge \epsilon$.

If a = 1 this condition is never satisfied. If a > 1, however, such an
$\epsilon$ will exist for a large class of $\theta$. Let $\Omega^*$ be the set
of $\theta$ for which there exist $\epsilon_1, \epsilon_2$ such that
$0 < \epsilon_1 < \theta_i < \epsilon_2 < 1$ for all but finitely many i and such that the
first order empirical distribution function of $\theta_n$ does not tend in
the limit to the distribution function of a degenerate random variable.
It may then be shown that a > 1 and $\theta \in \Omega^*$ imply
$\sum_{u<v} E[(\varphi_{i,u} - \varphi_{i,v})^2] > \epsilon > 0$ for all except possibly a finite number
of i.








Since the sum of independent identically distributed binomial
random variables is again a binomial random variable, and since the sum
is a sufficient statistic for $\theta$, it is clear the methods of this
section can be applied to the case of r independent observations for
each $\theta_i$.
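The closure property used here can be verified exactly for small cases. The sketch below, with hypothetical function names, convolves r copies of a Binomial(a, theta) probability mass function in rational arithmetic; the result coincides with the Binomial(ra, theta) mass function. It illustrates only the distributional claim, not sufficiency.

```python
from fractions import Fraction
from math import comb


def binom_pmf(n, theta):
    """Exact Binomial(n, theta) probabilities for 0, ..., n successes."""
    return [comb(n, x) * theta**x * (1 - theta) ** (n - x) for x in range(n + 1)]


def convolve(p, q):
    """Pmf of the sum of two independent variables with pmfs p and q."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out


def pmf_of_sum(a, theta, r):
    """Pmf of the sum of r independent Binomial(a, theta) observations."""
    total = [Fraction(1)]  # pmf of the constant 0
    for _ in range(r):
        total = convolve(total, binom_pmf(a, theta))
    return total
```

For instance, `pmf_of_sum(3, Fraction(2, 7), 4)` equals `binom_pmf(12, Fraction(2, 7))`, so the r observations for each parameter value may be replaced by their sum and treated as a single binomial observation with parameter ra.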


D. The normal distribution.





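Let $P[X \le x \mid \theta] = \int_{-\infty}^{x} p_\theta(t)\,dt$, where

$$p_\theta(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big\{-\frac{(x - \theta)^2}{2\sigma^2}\Big\}, \qquad -\infty < x < \infty,$$

for $\sigma > 0$ and $-\infty < \alpha \le \theta \le \beta < \infty$.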


For the present we assume $\sigma$ is known, although later we shall modify
this assumption somewhat. As we shall see, the estimation procedure
in the continuous case is similar to that in the discrete case. Without
loss of generality we take $\sigma = 1$. We fix $k \ge 1$.

Let {c., i=l, ϐ, ο. ) be a sequence of positive numbers such 


i 
that Lim αι log i= 0 and lim 1(e,)* 080) = 0 ο 


i o i e 
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£y. be the k dimensional vector consisting of zeros for 


all components except for the ee which IS egual to one. 


ETT ee PET [T ον. -- 
1,10 7 
O otherwise 
i-k 
2 Y, 1 
Py) 7 = k 
(i -k + Inch) 
BÉ E (Y, + eue) o fu. 7 048) 
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e) | 
xe EG) IE ο ο >0 and i =k, k + ieee 
Ber - 
Yy otherwise 
α in P#(X*) a 


e <P 
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B if wea P*(X7) 


We shall now prove the decision procedure $\varphi^k = (\varphi_1^k, \varphi_2^k, \ldots)$ is
uniformly asymptotically optimal of $k$th order.
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We shall show the conditions of the corollary to theorem. 2) are 
satisfied. 
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Let: m Cy, ) = 
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Mi 2 2 
η (vi; κε) 2 (y,-8 κερὶ 
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i-k | 51 , - 2 [53 - | 
e ae 


z(y) = HE TO I Se 


- 4 
for ee such that yp Ci SVG 3 S Y, C: 


L=1, 2... y ko What particular yr 3 is intended 
3 


will be clear from the context. 


Then; 
y ,t*c 
e 
= Be! 
m (y) = - rl N Ze, s e jet! dt 
Si en) 
gc 1 κ 1 a 1 Ec Mo 


NO 


a. .(y.) 
i-k 4k 
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where the ia 3 in z, are those vectors whose components 
Es 


arise from use of the mean value theorem. 


Now: ox Pk, ek-t Lo v9 y y, ER”, Ge Q^ 
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li -k+1 i-k+1 k 
<P eB ey) +  - [zi tt) 


but from lemma 3) this probability is 


(i-k+1 k e 2k 
< 2k exp {- 2 F (5, - EM er 1) Mo 
2(ktl) 2k ek+1,i - k + 1, 2k 2 
< 2k exp {2 C δι - 2 a θ (δ. = EMD 
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In a similar manner: 
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yo 1 Qe τει ΤΕ NIS 
di Y. T7 3Y- x41 a i-k Xx]: 
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Hence: 


Q! (Xs) 
3 ea -I rm > ES J Ya 


CE 2 (2c, ) e (1) 
sax exp { - AERA Me, = Jail = lal poe) aE} 


< 2k exp (CA = Qe os Ae, - [21] - la, DÊ}. 


᾿ k 
provided Ey 7 |2:| - la, | πα η 20 


Thus, using an argument similar to that used in proving lemma 4) and
letting $B = \max[|\alpha|, |\beta|]$, we have:


i-ktl κ - 

E ΠΠ a vi) ατα“ WIE yyl) IX j y| 
< 2k exp {εδ es, - ον e a E EM e} 

+ 2k exp {(20,)° Me, - pe Ξ πετ... - |z: | = la, ê} 
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To complete the proof it is sufficient to show that for some functions 


k 

E 2) and o (y) 81)» such that either ει > |2] + la, | To E 
; k _ | 
or €, = O and either δι > EN Y Or δι = 0, the following 


limits hold uniformly in 6. 


1) lim picti eG. E 


u q, rj) 


ii) lim pic eta (B + lx, De, 085) | = 0 


io E Q (X) 


iii) lim E 


i- o 


ter ae EDI : Ja] 2 EE 
ee a ee ee 


[AE I? 


iv) lim E 
i o 


1 e k = 
Es τ Ὠ EM ES (4 mum πὶ 0 
κ 
ao- og 


i œ 


vi) lim DIS : Res) - |a,(x,)| c =Q 


i- o 
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Let 


Q. Q. 
: > - if = a k 
(i -kt 1}1ορ i i=- kt ijlog i=" i (1 en m" 
DE 
i 
ο otherwise 
Q Q. 
i E k 
(i -k+ 1}1ορ i Ed i - k + 1)log ΠῚ i EN Y (i-e 1) 
€1 7 


O otherwise 


Since there exists $U < \infty$ such that $E[|X|] < U$ for all $\theta \in \Omega$, the first
two limits hold uniformly. The third and fourth also hold, recalling


lim y HL) = 0, 


i © 
Since for every $\eta > 0$ there exists $S(\eta) < \infty$ such that $P[|X| > S] < \eta$ for
all $\theta \in \Omega$, we have:
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do, - I cdm > 12] 
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(i-k+ e i-k+4 ] 
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Clearly the fifth limit will hold if we show lim |z ¿(731108 i=0 
i o 

uniformly in 0N and ly y! «S8 £71, ... , k. By a similar argument, 
to prove the sixth limit holds we need only to show 

! i = 0 
Lim (124 (4) la, (y,)1)108 i = O uniformly for %0e SN and ly, | «S 
g = l, e? 9 3 Ke 
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IA 


Iy, - yglly, + yj - 20, κερὶ 


IA 


c,(28 * 2B * c.) for all 9 εκ... 


j-k+L 


c y 
Now lx <i>|1- e] =] 5 El <lx] (e - 1). 
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clently large. 


But since lim c, log i s O we have shown lim Es ,| 108 = Order sa 
1 ο 1-2 οο 


9 and ly, SS £21, ... , k. 

We now consider lim |z!|log i. This limit is more difficult to 

co 

evaluate. By E s argument leading to the consideration of this 
‘limit it is clear that we need only show lim | 23 | Log i= Ofor y, such that 
[Υρ| oo RL, νο, k and Y, ο) dor j=zk, kt 1, .-- «© We 
shall use lemma 5). It is clear that there exists M <œ such that 
σα - (ο, ο) for all 9eN. ‘Thus we have for '. πα Ya such 


that lypl SS and y, # 61 joe 35 MERO I OEC 1S. 
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We shall now consider various parts of the right hand side of the above 


inequality. 


Vj =k, «oe , i and i sufficiently large: 


a) 


b) 
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Let = 
51.4 e yl 
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f) Let 5g y =P y * 95,4 7 1 


' i θα {Θ᾽ 
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Using the above six results we have for sufficiently large i: 
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S M*e. where M* is some finite constant independent of ®, i, 
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Hence Lim |z1| 10g i= O uniformly in @ and Y, such that Ix, τ S 
i- o 
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.. 5 πι ο 

a= + - —— ræ + 

. Sich ο y. 2c 8, ~ 26 γι. 20,9 ,) 
a a Sa a i ο E a — am + 9 + ο 

and Bc. Y», 3 S 


Beste, 
i 


Hence la, (7, | zen - 1) for all 8, y, such that Ix, s 8; 


and lim la, (y,,)| Log i= 0 uniformly in 9 as desired. 

io 

This completes the proof that the decision procedure $\varphi^k$ is
uniformly asymptotically optimal of $k$th order. $\varphi^k$ was defined for the
case $\sigma = 1$. If we had kept arbitrary $\sigma$ then $\varphi^k$ would have been


defined in the same manner except that gi.) would have been defined 


2 
+ = g 

E | | 

as m a ee If we relax the assumption o 
E 

known to the assumption $\sigma$ unknown but equal for all observations, then
it may be shown that if $\hat{\sigma}_i$ is an estimate which converges in probability
to $\sigma$ uniformly in $\theta$, we may replace $\sigma$ with $\hat{\sigma}_i$ in the definition
of $\varphi_i^k$ and the resulting decision procedure is still uniformly
asymptotically optimal.

If the problem is modified to allow r independent observations
for each $\theta_i$, then since the sum of these r observations is sufficient
for $\theta_i$ and also normally distributed, the above procedure will still
apply. We note in this case that if the common variance is unknown,
then for each i the usual estimate $\hat{\sigma}_i^2$ is independent of $\theta_i$ and
$\frac{1}{n} \sum_{i=1}^{n} \hat{\sigma}_i^2$ is a consistent estimate for $\sigma^2$.
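A minimal sketch of the pooled variance estimate follows. It assumes that "the usual estimate" means the within-group sample variance with divisor r - 1 (so r must be at least 2); the function names are hypothetical. Because each within-group variance is unchanged when a constant is added to the group, its distribution cannot depend on the group mean $\theta_i$, which is the property the paragraph relies on.

```python
from fractions import Fraction


def within_group_variance(obs):
    """Sample variance (divisor r - 1) of the r observations taken for one parameter value."""
    r = len(obs)
    mean = sum(obs, Fraction(0)) / r
    return sum(((x - mean) ** 2 for x in obs), Fraction(0)) / (r - 1)


def pooled_variance(groups):
    """Average of the within-group variances over the first n parameter values."""
    n = len(groups)
    return sum((within_group_variance(g) for g in groups), Fraction(0)) / n
```

Shifting every observation in a group by the same amount leaves `within_group_variance` unchanged, so `pooled_variance` can be computed without any knowledge of the individual means.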
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E. The gamma distribution. 
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Let P[X « x|0] - | gas 
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We assume a is known, and fix k > 1. 





Ο τ: 


k, k + 1, 


Let (ei), Sg Y 4,47 f,, and g, be defined as in Section D. 
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= y z fy.) Jp ey) Oz and An 
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x otherwise 
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β if B < Px(x*) 
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ee k κ k πι’ 
We shall prove the decision procedure $\varphi^k = (\varphi_1^k, \varphi_2^k, \ldots)$ is
uniformly asymptotically optimal of $k$th order.
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We shall show the conditions of the corollary to theorem 2) are 
satisfied. Since much of the argument is similar to that in the normal 


case we shall omit many of the intermediate steps. 
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To complete the proof it is sufficient to show that, for appropriate 


$\delta_i$ and $\epsilon_i$ the limits i), iii), iv), v) and vi) listed in Section D
hold uniformly in $\theta$, and that
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Then limits i), iii), and iv) clearly hold. Since E[X] < 
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We shall now consider various parts of the preceding inequality for some 
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where M is some finite constant independent of 9, y, or c. Using 
parts 4), e) and slight modifications of h) in the proof of the previous 


limit we have: 
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< cM as desired. This completes the proof that the decision pro- 
om ; i th 
cedure Q9 is uniformly asymptotically optimal of k order. 

We note, as in the previous sections, that if the problem is
modified to allow r independent observations for each $\theta_i$, then the
sum of these observations may be used to obtain an asymptotically
optimal decision procedure of $k$th order.


We now consider the case in which the other parameter is unknown, 


A 8-1 -Àx 
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that is Py (x) 7 T(5) * for known A. It may be shown that if 
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and all other definitions in this section are unchanged, then the 


resulting $\varphi^k$ is uniformly asymptotically optimal of $k$th order.
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lu The non-parametric case. 


We now consider the following problem. Let $\mathcal{P} = \{p_\theta(\cdot) : \theta \in \Omega\}$
be a class of probability mass functions, each of which assigns probability
one to a specified denumerable class $\mathcal{X} = \{x\}$ of real numbers.
$\Omega$ is an arbitrary index set. We assume for each $\theta$ in $\Omega$ that $p_\theta(\cdot)$
is completely specified. Let $h(\cdot)$ be a real valued function on $\mathcal{X}$.
Let $\lambda(\theta) = E[h(X) \mid \theta]$. We assume $E[h^2(X) \mid \theta] < B < \infty$ for all $\theta$ in
$\Omega$. For some unknown $\theta \in \Omega$ we observe r + 1 independent identically
distributed random variables $X_1, \ldots, X_{r+1}$ with
$P[X_s = x] = p_\theta(x)$, $s = 1, \ldots, r + 1$, $x \in \mathcal{X}$. We wish to estimate $\lambda(\theta)$ on the
basis of these observations. For example, if h(x) = x then we are
estimating $E[X \mid \theta]$. If $\varphi$ is our estimate we suffer a loss of
$(\varphi - \lambda(\theta))^2$. We now assume we are faced with a sequence of such
decisions. In other words a sequence $\{\theta_j : j = 1, 2, \ldots\}$ is
selected from $\Omega$. For each $\theta_j$ we have r + 1 observations and we
may use $X_j$ to estimate $\lambda(\theta_j)$, where $X_j$ is the $j \times (r+1)$ matrix of
observations $(X_{i,s})$. Johns [3] has considered this problem under the
assumption that each $\theta_j$ is an independent observation of an $\Omega$-valued
random variable $\Theta$ with unknown a priori probability measure G
defined over a suitable $\sigma$-algebra of subsets of $\Omega$. We shall consider
the case in which the sequence $\{\theta_j\}$ is arbitrarily chosen.

As in the previous cases we need a standard to use in evaluating
a particular decision procedure. For any $\theta_n \in \Omega^n$ we form the $k$th
order empirical probability measure $G_n^k$ such that for any sets
$\Omega_1, \ldots, \Omega_k$ in the $\sigma$-algebra,
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Eno with ΕΝ having any k-dimensional probability measure (Se and 


9 dependen On O, : es Ok? then the Bayes estimate for 

M8.) is Ε[λ{Θ.) [ΧΠ and the Bayes risk is 

R Ea) = E([A(0 ) - E[A(O I where xX is the k X (r+1) 
EtL m mesm ? =m 


matrix consisting of the last k rows of X a and the subscript rtl 
in the Bayes risk refers to the number of observations for each parameter 
value. We now take as our standard R (9) =R ο), and seek a 

kor =N PN 


procedure g* such that 


F) Tim {sup 2 Σ Bilo (X) ΠΤ R 8 IT : 
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n> »\ © 2 fs 


We observe that R ) is not a desirable standard since if 3 is 
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m 
the class of binomial densities, for example, then, as mentioned eariier, 
R  ..(9 ) could not be achieved. 
E οι 
We observe that theorems 1) and 2) are still valid in this case 
when Re On! is substituted for Do and property F) for the 
property of uniformly asymptotically optimal of n order. 
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ον - ; j where Bud is an arbitrary real number. 
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Aa) LA m(A*) to be the m( A“) distinct matrices 
obtained from Ae by independently permuting the elements 
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We shall prove 9 = (9, φο» »»' ) has property F) provided the 


following condition on Y is satisfied. we O<ec<l 
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condition is satisfied, for example, if 0e N xeg — pg(x) 2» n(x) ο 
Since in this case 


j^» 


(Xs sag) = “| 


ime 
Na 


cz Es -k+£ 


m r ε 

; m Lud £=1 I "e κω d πας, ; | 
k 

ΙΝ ες a Urt) 


ᾗ 
Ῥ es ) 
a ους 
DEA p 
where, as before, I(a, b) = 


O otherwise . 


But y 5 >0 there exists a set da c & such that Es has only a 


finite number Νε of elements and P[xez,] »] - ô. Hence: 
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(3,) χεᾶς 


Since $\delta$ was arbitrary the result quickly follows. This is not the
only case, however, in which the condition is satisfied, as was seen in
Sections A and C.

The proof that $\varphi^k$ has property F) follows the same general lines
as in our other examples. $\forall\, i = k, k + 1, \ldots$
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Then for Aver, we have that one version of 
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Hence, arguing as in Section A, 
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Since Z. is not bounded we are unable to use lemma 3). We can, however, 
use a simple Chebyshev bound, observing that Var[z (A“)] 
ES ενα, 4121 S Bs and hence using an argument similar to that in the 
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proof of lemma 3) we have: 
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1/4 


l 
I ED El 
Q; log i m. y l/s 
πα 12 (1 - k + 1)log 1 
5. = 


ο otherwise 


n 
a 
. 3» 
Then clearly lim = ) Elė, + A > aj] ΞΟ uniformly in 0 


n0 j=k A 


Since our assumed condition assures that lim E » P[Q, < a1 Ξ 0 
n> 0 i=k 


uniformly in 6, theorem 2) is satisfied, and g“ has 
property F). 

We observe that in this case, as in the binomial, the choice of which
information to neglect at the $i$th stage was arbitrary. In particular
$X_i'$ could have been defined in any one of $(r + 1)^k$ ways. We thus
could have defined $(r + 1)^k$ essentially different estimators $\varphi_{i,u}$,
each of which would have the desired properties. As in the binomial case
it may be shown that $\bar{\varphi}_i = (r+1)^{-k} \sum_{u=1}^{(r+1)^k} \varphi_{i,u}$ is an improved estimate.
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We note that if 3 had been defined as a class of absolutely 
continuous distribution functions, a similar decision procedure could
have been derived. As in the normal and gamma examples, a sequence
$\{c_i\}$ would allow us to treat this continuous case as we did the discrete
case, using lemma 5) to show the appropriate limits hold.
G. The empirical Bayes problem.


We now consider a modification of our decision problem in which the
sequence $\{\theta_i\}$ is not an arbitrary sequence, but is instead a sequence
of observations of random variables. If these random variables are
independent and identically distributed then the problem has been called
the empirical Bayes problem. Many fine articles have been written on
this problem and the results obtained have inspired this paper. We
shall here, however, consider a more general form of the problem.
Instead of assuming the $\theta_i$ to be independent observations of a random
variable $\Theta$, we assume the sequence $\{\theta_i\}$ to be a realization of a
stochastic process $\{\Theta_i : i = 1, 2, \ldots\}$ which is strictly stationary
of order k. In other words for any k positive integers
$i_1, i_2, \ldots, i_k$ and any positive integer j the k dimensional
random vectors $(\Theta_{i_1}, \Theta_{i_2}, \ldots, \Theta_{i_k})$ and $(\Theta_{i_1+j}, \Theta_{i_2+j}, \ldots, \Theta_{i_k+j})$
are identically distributed. In particular, we suppose that
$\forall\, i \ge k$ the vector $(\Theta_{i-k+1}, \ldots, \Theta_i)$ has distribution
function $G^k(y_1, \ldots, y_k)$. Thus if $G^k(y_1, \ldots, y_k) = \prod_{\ell=1}^{k} G(y_\ell)$ for some G we would
have the standard empirical Bayes case.
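To make the stationarity assumption concrete, the following sketch uses a hypothetical two-state Markov chain (not taken from the text) as the parameter process. Started from its stationary distribution, every pair of consecutive parameters has the same joint distribution, so the process is strictly stationary of order k = 2 along consecutive indices, yet that joint distribution is not a product and the standard empirical Bayes case does not apply. The transition matrix and names are assumptions chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical transition matrix on two parameter values (states 0 and 1).
P = [[Fraction(9, 10), Fraction(1, 10)],
     [Fraction(3, 10), Fraction(7, 10)]]
# Stationary distribution of P: PI satisfies PI * P = PI.
PI = [Fraction(3, 4), Fraction(1, 4)]


def step(marginal):
    """Marginal distribution one time step later."""
    return [sum(marginal[x] * P[x][y] for x in range(2)) for y in range(2)]


def pair_distribution(i, initial):
    """Joint distribution of the parameters at times i and i + 1 (i >= 1)."""
    marginal = list(initial)
    for _ in range(i - 1):
        marginal = step(marginal)
    return {(x, y): marginal[x] * P[x][y] for x in range(2) for y in range(2)}
```

Here `pair_distribution(i, PI)` is the same for every i, whereas starting from a non-stationary initial distribution such as `[1, 0]` makes it change with i.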
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If τ᾽ were known and if 9, were distributed independently of 


(8. , De then the standard Bayes argument would yield 


i-k) 
E ΕΘ, |ΧῚ as an estimate for 9, which minimizes the expected loss 


and achieves the Bayes risk $R(G^k)$. Even if $\Theta_i$ were not distributed
independently of $(\Theta_1, \ldots, \Theta_{i-k})$, this estimate might still be a "good"
estimate, and the risk $R(G^k)$ a reasonable risk to attain. We shall
show that any procedure which is asymptotically optimal of $k$th order
(derived under the assumption of an arbitrary $\theta$) will also achieve
asymptotically an average risk less than or equal to $R(G^k)$. To be more
precise, we shall show the following:
Let: $\Omega$ be a bounded interval of the real line.

$\{\Theta_i : i = 1, 2, \ldots\}$ be a strictly stationary stochastic process
of order k.

$G^n$ be the joint distribution function of $(\Theta_1, \ldots, \Theta_n)$.

$\mathcal{G}$ be the class of all possible sequences of distribution functions
$\{G^n : n = 1, 2, \ldots\}$ such that $G^n$ is the n dimensional
marginal distribution obtained from $G^{n+1}$, $G^k$ satisfies the
above definitions, and $G^n$ puts probability one on $\Omega^n$ for
all n.

$R(\varphi_i; G^i)$ be the risk of using the estimate $\varphi_i$ for $\theta_i$ when
the vector $(\Theta_1, \ldots, \Theta_i)$ is distributed according to $G^i$.
This risk depends, of course, on the class
$\{p_\theta(\cdot) : \theta \in \Omega\}$.

$R_n(\varphi, G^n) = \frac{1}{n} \sum_{i=1}^{n} R(\varphi_i; G^i)$ where $G^i$ is the i dimensional
marginal distribution obtained from $G^n$.

We now state and prove a generalization of a theorem by Samuel [11].

Theorem 4) Let $\mathcal{P} = \{p_\theta(\cdot) : \theta \in \Omega\}$ be a class of distribution functions.
Let $\varphi^k$ be a decision procedure which is asymptotically optimal of
$k$th order for $\mathcal{P}$. Then

$$\limsup_{n \to \infty} R_n(\varphi^k, G^n) \le R(G^k)$$

for all $\{G^n\} \in \mathcal{G}$. If $\varphi^k$ is uniformly asymptotically optimal then the
above inequality becomes

$$\limsup_{n \to \infty} \sup_{\{G^n\} \in \mathcal{G}} [R_n(\varphi^k, G^n) - R(G^k)] \le 0.$$
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For the remainder of the proof we shall let E,[:] represent expectation 
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ο τοσο ο οκ. 


IH ee 9. have & priori distribution function 


k “ 
G , and ESA represent expectation where ©, ,,,, +...» O, have 
4 . e. “ . th 2 Φ 4 a 
om riori distribution Function at, the K order empirical distri- 


2. » 0. We now let A(x’) 
n 


bution function generated by ϐ O be 
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a form of E Lo, [X7] and es be a form of E Le, [x]. Then A 


2 
achieves the risk Be and Y the risk R (80). We observe v 2. 


2 BEIM Ee Η 
,L (AQ) - 65)" 10, = 8.1 - ESL OM) 9) (9, * 8,1 


We call this common value L(9 Then 


"E 
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But since g“ is asymptotically optimal we have 


ex fl 
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n> 0 1 


E k 
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and hence, since our losses are bounded, 
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=> Tim Ro, α) - se) Or 
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The proof of the second part of the theorem follows immediately. 
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Corollary. 


If in addition to the assumptions of theorem 4) we add the condition
that $\forall\, i = k + 1, k + 2, \ldots$, $\Theta_i$ is distributed independently of the
vector $(\Theta_1, \ldots, \Theta_{i-k})$, then the two conclusions of the theorem may be replaced
by

$$\lim_{n \to \infty} R_n(\varphi^k, G^n) = R(G^k)$$

and

$$\lim_{n \to \infty} R_n(\varphi^k, G^n) = R(G^k) \text{ uniformly for } \{G^n\} \in \mathcal{G}$$

respectively.
Proof:


To prove both parts of the corollary it is sufficient to show
$\liminf_{n \to \infty} R_n(\varphi^k, G^n) \ge R(G^k)$. But since $\Theta_i$ is independent of
$(\Theta_1, \ldots, \Theta_{i-k})$, $R(G^k)$ is the minimum risk that can be attained by any
estimate of $\theta_i$. Hence $R(\varphi_i, G^i) \ge R(G^k)$ for all $i \ge k$. It may be shown that
$i < k \Rightarrow R(G^i) \ge R(G^k)$, so that $R(\varphi_i, G^i) \ge R(G^i) \ge R(G^k)$ for all
$i < k$. Thus $R_n(\varphi^k, G^n) \ge R(G^k)$ for all n so that

$$\liminf_{n \to \infty} R_n(\varphi^k, G^n) \ge R(G^k)$$

as desired.

Q.E.D.
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